Piscataway
Scalable Runtime Architecture for Data-driven, Hybrid HPC and ML Workflow Applications
Merzky, Andre, Titov, Mikhail, Turilli, Matteo, Kilic, Ozgur, Wang, Tianle, Jha, Shantenu
Hybrid workflows combining traditional HPC and novel ML methodologies are transforming scientific computing. This paper presents the architecture and implementation of a scalable runtime system that extends RADICAL-Pilot with service-based execution to support AI-out-HPC workflows. Our runtime system enables distributed ML capabilities, efficient resource management, and seamless HPC/ML coupling across local and remote platforms. Preliminary experimental results show that our approach manages concurrent execution of ML models across local and remote HPC/cloud resources with minimal architectural overheads. This lays the foundation for prototyping three representative data-driven workflow applications and executing them at scale on leadership-class HPC platforms.
Data-Driven Sequential Sampling for Tail Risk Mitigation
In various operational problems, risk-sensitive decision makers often encounter the challenge of selecting an alternative with minimal tail risk from a collection of stochastic alternatives that generate random losses. Tail risk, in this context, refers to the potential for experiencing substantial losses, which will be formally defined shortly. Despite the significance of addressing this challenge, the majority of related studies still focus on identifying a subset of the alternatives with acceptable (or minimal) expected losses, rather than using tail risk as a ranking criterion. Our objective is to develop a tractable and effective solution to this problem in situations where decision makers aim to compare the alternatives based only on their tail risk. In practical scenarios, it would be ideal to apply our proposed solution to the aforementioned subset of the alternatives, which can be obtained via existing approaches, so that decision makers can ultimately find an alternative with both acceptable expected loss and minimal tail risk.
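As a rough illustration of ranking alternatives by tail risk rather than expected loss, the sketch below assumes that tail risk is measured by the empirical conditional value-at-risk (CVaR), i.e. the mean of the worst (1 - alpha) fraction of observed losses. The abstract does not state which tail risk measure is used, and this fixed-sample comparison is not the sequential sampling procedure itself; the names `empirical_cvar` and `select_min_tail_risk` are hypothetical.

```python
import math


def empirical_cvar(losses, alpha=0.95):
    """Mean of the worst (1 - alpha) fraction of the observed losses.

    Assumption: CVaR stands in for the tail risk measure, which the
    abstract leaves undefined.
    """
    if not losses:
        raise ValueError("need at least one loss sample")
    ordered = sorted(losses, reverse=True)  # largest losses first
    # Number of samples in the tail; always keep at least one.
    k = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    return sum(ordered[:k]) / k


def select_min_tail_risk(samples_by_alternative, alpha=0.95):
    """Return the key of the alternative with the smallest empirical CVaR."""
    return min(
        samples_by_alternative,
        key=lambda name: empirical_cvar(samples_by_alternative[name], alpha),
    )


# Two alternatives with the same mean loss (4.0) but different tails.
samples = {
    "a": [1.0, 2.0, 3.0, 10.0],
    "b": [4.0, 4.0, 4.0, 4.0],
}
best = select_min_tail_risk(samples, alpha=0.5)
```

In this toy data both alternatives have the same expected loss, so a mean-based criterion cannot separate them, while the tail-based criterion prefers the alternative without the occasional large loss.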
Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts
However, real-world data often exhibit complex local structures that can be challenging for single-model approaches with a smooth global manifold in the embedding space to unravel. In this work, we conjecture that in the latent space of these large language models, the embeddings live in a local manifold structure with different dimensions depending on the perplexities and domains of the input data, commonly referred to as a Stratified Manifold structure, which in combination form a structured space known as a Stratified Space. To investigate the validity of this structural claim, we propose an analysis framework based on a Mixture-of-Experts (MoE) model where each expert is implemented with a simple dictionary learning algorithm at varying sparsity levels. By incorporating an attention-based soft-gating network, we verify that our model learns specialized sub-manifolds for an ensemble of input data sources, reflecting the semantic stratification in LLM embedding space. We further analyze the intrinsic dimensions of these stratified sub-manifolds and present extensive statistics on expert assignments, gating entropy, and inter-expert distances. Our experimental results demonstrate that our method not only validates the claim of a stratified manifold structure in the LLM embedding space, but also provides interpretable clusters that align with the intrinsic semantic variations of the input data.
Learning To Help: Training Models to Assist Legacy Devices
Machine learning models implemented in hardware on physical devices may be deployed for a long time. The computational abilities of the device may be limited and become outdated with respect to newer improvements. Because of the size of ML models, offloading some computation (e.g. to an edge cloud) can help such legacy devices. We cast this problem in the framework of learning with abstention (LWA) in which the expert (edge) must be trained to assist the client (device). Prior work on LWA trains the client assuming the edge is either an oracle or a human expert. In this work, we formalize the reverse problem of training the expert for a fixed (legacy) client. As in LWA, the client uses a rejection rule to decide when to offload inference to the expert (at a cost). We find the Bayes-optimal rule, prove a generalization bound, and find a consistent surrogate loss function. Empirical results show that our framework outperforms confidence-based rejection rules.
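For orientation, the confidence-based rejection rule that the abstract uses as a baseline can be sketched as follows. This is the baseline, not the Bayes-optimal rule derived in the paper; the threshold value, the function names, and the stand-in expert are illustrative assumptions.

```python
def client_decision(class_probs):
    """Return the client's predicted label and its confidence."""
    label = max(range(len(class_probs)), key=class_probs.__getitem__)
    return label, class_probs[label]


def predict_with_offload(class_probs, expert, x, threshold=0.8):
    """Confidence-based rejection: offload to the expert when the client is unsure.

    Returns (label, offloaded). `expert` is any callable mapping an input to
    a label; in the setting described above it would run on the edge, and
    calling it would incur a cost.
    """
    label, confidence = client_decision(class_probs)
    if confidence >= threshold:
        return label, False  # the client answers locally
    return expert(x), True   # the client rejects and the expert answers


def toy_expert(x):
    """Hypothetical stand-in for the edge model."""
    return 1


confident = predict_with_offload([0.9, 0.1], toy_expert, x=None)
unsure = predict_with_offload([0.55, 0.45], toy_expert, x=None)
```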
Toward Holistic Planning and Control Optimization for Dual-Arm Rearrangement
Gao, Kai, Ye, Zihe, Zhang, Duo, Huang, Baichuan, Yu, Jingjin
Long-horizon task and motion planning (TAMP) is notoriously difficult to solve, let alone optimally, due to the tight coupling between the interleaved (discrete) task and (continuous) motion planning phases, where each phase on its own is frequently an NP-hard or even PSPACE-hard computational challenge. In this study, we tackle the even more challenging goal of jointly optimizing task and motion plans for a real dual-arm system in which the two arms operate in close vicinity to solve highly constrained tabletop multi-object rearrangement problems. Toward that, we construct a tightly integrated planning and control optimization pipeline, Makespan-Optimized Dual-Arm Planner (MODAP), that combines novel sampling techniques for task planning with state-of-the-art trajectory optimization techniques. Compared to the previous state of the art, MODAP produces task and motion plans that better coordinate a dual-arm system, delivering significant execution time improvements while simultaneously ensuring that the resulting time-parameterized trajectory conforms to specified acceleration and jerk limits.
Targeted Parallelization of Conflict-Based Search for Multi-Robot Path Planning
Multi-Robot Path Planning (MRPP) on graphs, equivalently known as Multi-Agent Path Finding (MAPF), is a well-established NP-hard problem with critically important applications. As serial computation in (near)-optimally solving MRPP approaches the computation efficiency limit, parallelization offers a promising route to push the limit further, especially in handling hard or large MRPP instances. In this study, we initiate a targeted parallelization effort to boost the performance of conflict-based search for MRPP. Specifically, when instances are relatively small but robots are densely packed with strong interactions, we apply a decentralized parallel algorithm that concurrently explores multiple branches, which leads to markedly enhanced solution discovery. On the other hand, when instances are large with sparse robot-robot interactions, we prioritize node expansion and conflict resolution. Our innovative multi-threaded approach to parallelizing bounded-suboptimal conflict-based search algorithms demonstrates significant improvements over baseline serial methods in success rate or runtime. Our contribution further pushes the understanding of MRPP and charts a promising path for elevating solution quality and computational efficiency through parallel algorithmic strategies.
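Conflict-based search repeatedly checks a set of single-robot paths for conflicts and branches on the first one it finds. The sketch below shows only that conflict check for two robots, under the common assumptions that time is discrete and that a robot waits at its goal once its path ends. It is not the parallel algorithm from the paper, and the function names are hypothetical.

```python
def position(path, t):
    """Location of a robot at time t; the robot stays at its goal afterwards."""
    return path[min(t, len(path) - 1)]


def first_conflict(path_a, path_b):
    """Return the earliest conflict between two paths, or None.

    A vertex conflict is reported as ("vertex", t, location), and an edge
    (swap) conflict as ("edge", t, (from_location, to_location)) from the
    point of view of the first robot.
    """
    horizon = max(len(path_a), len(path_b))
    for t in range(horizon):
        a, b = position(path_a, t), position(path_b, t)
        if a == b:
            return ("vertex", t, a)
        a_next, b_next = position(path_a, t + 1), position(path_b, t + 1)
        if a_next == b and b_next == a:  # the robots swap places
            return ("edge", t, (a, a_next))
    return None


# Two robots crossing a corridor in opposite directions meet in the middle.
crossing = first_conflict(
    [(0, 0), (0, 1), (0, 2)],
    [(0, 2), (0, 1), (0, 0)],
)
```

In a full solver, each conflict found this way would spawn child nodes that forbid one of the two robots from occupying the conflicting location at that time step; expanding those nodes is the kind of work a parallel variant can distribute across threads.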
Feynman Diagrams as Computational Graphs
Hou, Pengcheng, Wang, Tao, Cerkoney, Daniel, Cai, Xiansheng, Li, Zhiyi, Deng, Youjin, Wang, Lei, Chen, Kun
We propose a computational graph representation of high-order Feynman diagrams in Quantum Field Theory (QFT), applicable to any combination of spatial, temporal, momentum, and frequency domains. Utilizing the Dyson-Schwinger and parquet equations, our approach effectively organizes these diagrams into a fractal structure of tensor operations, significantly reducing computational redundancy. This approach not only streamlines the evaluation of complex diagrams but also facilitates an efficient implementation of the field-theoretic renormalization scheme, crucial for enhancing perturbative QFT calculations. Central to this advancement is the integration of Taylor-mode automatic differentiation, a technique employed in machine learning packages to compute higher-order derivatives efficiently on computational graphs. To operationalize these concepts, we develop a Feynman diagram compiler that optimizes diagrams for various computational platforms, utilizing machine learning frameworks. Demonstrating this methodology's effectiveness, we apply it to the three-dimensional uniform electron gas problem, achieving unprecedented accuracy in calculating the quasiparticle effective mass at metal density. Our work demonstrates the synergy between QFT and machine learning, establishing a new avenue for applying AI techniques to complex quantum many-body problems.
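To make the idea of Taylor-mode automatic differentiation concrete, the following minimal sketch propagates truncated Taylor coefficients through additions and multiplications, so that all derivatives up to a chosen order come out of a single forward pass. It is a toy illustration in plain Python, not the compiler described above; the class `Taylor` and the test function `f` are hypothetical.

```python
from math import factorial


class Taylor:
    """Truncated Taylor series c[0] + c[1]*t + ... + c[K]*t**K."""

    def __init__(self, coeffs):
        self.c = list(coeffs)

    @classmethod
    def variable(cls, x0, order):
        """Seed the input x(t) = x0 + t, truncated at the given order."""
        coeffs = [x0] + [0.0] * order
        if order >= 1:
            coeffs[1] = 1.0
        return cls(coeffs)

    def _lift(self, other):
        """Promote a plain number to a constant series of matching length."""
        if isinstance(other, Taylor):
            return other
        return Taylor([other] + [0.0] * (len(self.c) - 1))

    def __add__(self, other):
        other = self._lift(other)
        return Taylor([a + b for a, b in zip(self.c, other.c)])

    __radd__ = __add__

    def __mul__(self, other):
        # Cauchy product, dropping terms above the truncation order.
        other = self._lift(other)
        n = len(self.c)
        out = [0.0] * n
        for i in range(n):
            for j in range(n - i):
                out[i + j] += self.c[i] * other.c[j]
        return Taylor(out)

    __rmul__ = __mul__

    def derivatives(self):
        """Convert Taylor coefficients to derivatives: f^(k) = k! * c[k]."""
        return [factorial(k) * ck for k, ck in enumerate(self.c)]


def f(x):
    return x * x * x + 2 * x


# Value and first three derivatives of f at x = 2 from one forward pass.
derivs = f(Taylor.variable(2.0, order=3)).derivatives()
```

The design point is that the cost grows polynomially with the derivative order, because each operation works on a whole coefficient vector, instead of nesting first-order differentiation repeatedly.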
Gaussian Process-Based Learning Control of Underactuated Balance Robots with an External and Internal Convertible Modeling Structure
External and internal convertible (EIC) form-based motion control is one of the effective designs for simultaneous trajectory tracking and balance of underactuated balance robots. Under certain conditions, however, the EIC-based control design leads to uncontrolled robot motion. We present a Gaussian process (GP)-based data-driven learning control for underactuated balance robots with the EIC modeling structure. Two GP-based learning controllers are presented by using the EIC structure property. The partial EIC (PEIC)-based control design partitions the robotic dynamics into a fully actuated subsystem and one reduced-order underactuated system. The null-space EIC (NEIC)-based control compensates for the uncontrolled motion in a subspace, while the other closed-loop dynamics are not affected. Under the PEIC- and NEIC-based control designs, the tracking and balance tasks are guaranteed, and convergence rate and bounded errors are achieved without the uncontrolled motion caused by the original EIC-based control. We validate the results and demonstrate the GP-based learning control design performance using two inverted pendulum platforms.
Experimental Evaluation of Methods for Estimating Frequency Response Functions of a 6-axes Robot
Zimmermann, Stefanie A., Moberg, Stig
Nonparametric estimates of frequency response functions (FRFs) are often suitable for describing the dynamics of a mechanical system. When treated as measurement inputs, these estimates can be used for parametric identification of, e.g., a gray-box model. Classical methods for nonparametric FRF estimation of MIMO systems require at least as many experiments as the system has inputs. Local parametric FRF estimation methods have been developed to avoid multiple experiments. In this paper, these local methods are adapted and applied for estimating the FRFs of a 6-axes robotic manipulator, which is a nonlinear MIMO system operating in closed loop. The aim is to reduce the experiment time and amount of data needed for identification. The resulting FRFs are analyzed in an experimental study and compared to estimates obtained by classical MIMO techniques. It is furthermore shown that an accurate parametric model identification is possible based on local parametric FRF estimates and that the total experiment time can be significantly reduced.